Cleanup Docker image handling, ECS service management, and model training resources by forstmeier · Pull Request #793 · oscmcompany/fund

forstmeier · 2026-03-16T01:05:45Z

Overview

Changes

reorganize/consolidate Docker image handling
cleanup GitHub workflows ECS/data management
update AWS/Pulumi resource naming
consolidate TiDE model training resouces

Context

Some additional fixes given the new Prefect pattern.

Summary by CodeRabbit

Chores
- Standardized service names with "-server" suffix and aligned image/repository naming.
- Consolidated model-training components into server/worker roles and switched build/push semantics to package_name.
- Runbook updated for automated environment secret population.
New Features
- Added in-tool deploy command and a matrix-based deploy workflow for package+stage deployments.
Behavior Changes
- Removed automated scheduled data-sync and switched teardown to manual-only runs.
- Model training deployments limited to weekdays with a configured timezone.

…ning resources

coderabbitai · 2026-03-16T01:06:07Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Consolidates CI build/push into an application+stage matrix with a separate deploy_images job, removes scheduled sync/teardown triggers, renames trainer/training → runner/model-trainer across infra/storage/compute/exports, refactors TiDE training into workflow.train_model, and updates Maskfile, Dockerfiles, tests, and docs.

Changes

Cohort / File(s)	Summary
CI/CD workflows `.github/workflows/launch_infrastructure.yaml`, `.github/workflows/teardown_infrastructure.yaml`, `.github/workflows/sync_data.yaml`	Converted launch matrix from `service`→`application`+`stage`, renamed build job to include `application-stage`, added `deploy_images` job (matrix deploy), removed scheduled triggers and deleted the separate `sync_data` workflow.
Mask & build tooling `maskfile.md`, `.github/workflows/launch_infrastructure.yaml`	Switched build/push command signatures from `application_name`→`package_name`, added commit-hash existence check/ECR login, caching adjustments, and a new `deploy (package_name) (stage_name)` command; updated service mapping to `-server`/`-worker` names.
Infrastructure exports & entrypoint `infrastructure/__main__.py`	Renamed/re-exported ECR outputs and image URIs from trainer→runner and training→model-trainer-server-worker; updated training API host to `model-trainer-server` and removed legacy training_worker_image export.
Storage / ECR repositories `infrastructure/storage.py`, `libraries/python/tests/test_infrastructure_storage.py`	Replaced `tide-trainer`→`tide-runner` and `training-server/worker`→`model-trainer-server-worker` repositories; adjusted lifecycle resource names and image URI variables; updated tests to assert new lifecycle names.
Compute resources & naming `infrastructure/compute.py`	Mass renames and refactors: `training`→`model-trainer`, many services/tasks gain `-server` suffix, updated target groups, service discovery, RDS/ElastiCache names, env vars, secrets, and dependency refs to match new model-trainer naming.
Models: TiDE training consolidation `models/tide/src/tide/workflow.py`, `models/tide/src/tide/trainer.py`	Removed legacy `trainer.py`; introduced `DEFAULT_CONFIGURATION` and `train_model` in `workflow.py` (deferred imports) and updated call sites to use the new workflow entrypoint.
Models: scheduling & Dockerfiles `models/tide/src/tide/deploy.py`, `models/tide/tests/test_deploy.py`, `models/Dockerfile`, `models/tide/Dockerfile`	Changed TiDE scheduled cron to weekdays and added `timezone="America/New_York"`; renamed Docker build stages (`worker`→`server-worker`, `trainer`→`runner`); tests adjusted accordingly.
Models: tests updates `models/tide/tests/test_trainer.py`, `models/tide/tests/test_workflow.py`	Removed legacy `test_trainer.py`; expanded `test_workflow.py` to exercise new `train_model`/`DEFAULT_CONFIGURATION` behavior; updated tests to patch/train via `workflow.train_model`.
Docs / runbook & tooling `infrastructure/github_environment_runbook.md`, `.claude/commands/update-pull-request.md`	Runbook removed manual Pulumi secret extraction and added PULUMI_ACCESS_TOKEN guidance; PR tooling moved many update-pull-request flows from GraphQL→REST, refactored scratchpad files and posting/mutation logic.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Developer
    participant GHA as "GitHub Actions"
    participant Matrix as "workflow matrix"
    participant Mask as "Mask / Flox"
    participant ECR as "AWS ECR"
    participant ECS as "AWS ECS"
    Developer->>GHA: push / workflow_dispatch
    GHA->>Matrix: expand application+stage matrix
    Matrix->>Mask: run build (package, stage)
    Mask->>ECR: check image by commit-hash
    alt image exists
        Mask-->>Matrix: skip build
    else image missing
        Mask->>Mask: build image (package-stage)
        Mask->>ECR: push image (package-stage)
    end
    GHA->>GHA: deploy_images job triggered (matrix)
    GHA->>ECS: deploy service using matrix.application/matrix.stage
    ECS->>ECS: wait for service stability
    GHA->>Developer: workflow complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Add fixes from manual local testing #760: Overlaps on launch_infrastructure workflow and infra export/matrix changes touching the same files.
Parallelize launch infrastructure steps and cleanup comments #642: Related rework of the launch_infrastructure matrix and separate deploy job (build/push → matrix-driven, deploy split).
Codebase reorganization and cleanup #792: Large refactor with overlapping renames (trainer→runner, training→model-trainer) across infra, ECR, and CI configuration.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 30.77% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the primary changes: Docker image handling reorganization, ECS service management updates, and model training resource consolidation across multiple files.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch docker-image-handling-updates

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

Migrating from UI to YAML configuration.

Use the @coderabbitai configuration command in a PR comment to get a dump of all your UI settings in YAML format. You can then edit this YAML file and upload it to the root of your repository to configure CodeRabbit programmatically.

greptile-apps · 2026-03-16T01:09:23Z

Greptile Summary

This PR consolidates Docker image handling for TiDE model training (server + worker now share a single ECR repo and Dockerfile), renames ECS services/task families/log groups to a consistent -server/-worker suffix convention, removes the automated scheduled launch/teardown triggers in favour of push-only and manual-only runs, and moves per-service ECS deployments into a dedicated deploy_images workflow job rather than the Pulumi stack-up script.

Key changes:

Single shared models/Dockerfile (renamed from tools/Dockerfile) replaces two separate ECR repos (fund/training-server, fund/training-worker) with one fund/model-trainer-server-worker repo.
New mask infrastructure image deploy command triggers aws ecs update-service --force-new-deployment and waits for stabilisation per service, replacing the inline bash loop that was inside the Pulumi stack-up script.
TiDE trainer.py deleted; train_model inlined into workflow.py with deferred tinygrad imports kept intact.
Critical: The server ECS task definition (training_server_task_definition) sets only command (Docker CMD), not entryPoint. Because models/Dockerfile declares an ENTRYPOINT pointing to the Prefect worker command, Docker will append the command value as arguments to the worker ENTRYPOINT rather than replacing it. The Prefect server will never start; the service will crash-loop and fail its ALB health check.
ECR lifecycle policy countNumber corrected from 0 to 1 day (a 0-day threshold expired untagged images immediately on push).

Confidence Score: 1/5

Not safe to merge — the Prefect server ECS service will crash-loop on every deployment until the missing entryPoint override is added.
The consolidation of server + worker into a single Docker image is architecturally sound, but the server's ECS container definition does not override the image's ENTRYPOINT (which is the Prefect worker command). ECS command maps to Docker CMD and is appended to the ENTRYPOINT, so the server container will run the worker process with garbage arguments and immediately error out. This breaks the model-trainer-server ECS service in production.
infrastructure/compute.py (server task definition around line 495) and models/Dockerfile (ENTRYPOINT declaration) need to be addressed together to fix the server startup issue.

Important Files Changed

Filename	Overview
infrastructure/compute.py	Renames all ECS services, task families, log groups, and service-discovery entries to the new "-server"/"-worker" convention; consolidates server+worker into one shared image URI — but the server task definition is missing an `entryPoint` override, causing the Prefect server to fail to start (see P0 comment).
infrastructure/storage.py	Reverts S3 buckets from `BucketV2` back to `Bucket`, fixes ECR repo names to use hyphens consistently, consolidates two separate training ECR repos into a single `fund/model-trainer-server-worker` repo, and fixes the lifecycle policy `countNumber` from 0 to 1 (preventing immediate expiry of untagged images).
models/Dockerfile	Renamed from `tools/Dockerfile`; renames the final stage from `worker` to `server-worker` and copies `models/tide/` into the build. The `ENTRYPOINT` remains the Prefect worker command, which conflicts with the server ECS task definition that only sets `command` (see P0 comment on compute.py).
.github/workflows/launch_infrastructure.yaml	Removes scheduled launch trigger and adds a matrix-based `deploy_images` job that runs per-service ECS `force-new-deployment` after Pulumi completes; adds `model-trainer` as a build/push matrix entry and splits server/worker as separate deploy entries.
maskfile.md	Renames `images` → `image` and `services` → `service` subcommands, adds a new `deploy` command that runs per-service ECS deployments, and fixes the `build` command to map hyphenated package names to underscore directory names when resolving Dockerfile paths.
models/tide/src/tide/workflow.py	Absorbs the deleted `trainer.py` module by inlining `train_model` directly into `workflow.py` with deferred imports for tinygrad-heavy dependencies; updates the `train_tide_model` task to call the local function instead of the removed module import.
models/tide/src/tide/deploy.py	Restricts the TiDE training schedule to weekdays only (`0 22 * * 1-5`) and adds `timezone="America/New_York"` so the cron fires at 10 PM ET rather than UTC.

Comments Outside Diff (1)

infrastructure/compute.py, line 495-507 (link)

Server starts worker process instead of Prefect server

models/Dockerfile sets the image-level ENTRYPOINT to the worker command:
```
ENTRYPOINT ["uv", "run", "--package", "tools", "prefect", "worker", "start", "--pool", "training-pool", "--type", "process"]
```
The server container definition here only overrides command (Docker CMD), not entryPoint. In Docker/ECS, command is appended to the existing ENTRYPOINT rather than replacing it. The server container would therefore execute:
```
uv run --package tools prefect worker start --pool training-pool --type process bash -c "..."
```
prefect worker start would receive bash and -c "..." as extra positional arguments and error out. The Prefect server would never start, causing the ECS service to crash-loop and the ALB health check on port 4200 to fail.

The fix is to add an entryPoint override that resets the Docker ENTRYPOINT so the command can run bash -c "..." directly:
```
"entryPoint": ["bash", "-c"],
"command": [
    (
        "export PREFECT_API_DATABASE_CONNECTION_URL="
        '$(python3 -c "'
        "import os, urllib.parse;"
        "p=urllib.parse.quote(os.environ['PREFECT_DB_PASSWORD'],safe='');"
        f"print(f'postgresql+asyncpg://prefect:{{p}}@{args[1]}/prefect')"
        '")'
        " && prefect server start --host 0.0.0.0"
    )
],
```
Alternatively, change models/Dockerfile to use CMD instead of ENTRYPOINT for the default worker command so ECS command can fully replace it.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: infrastructure/compute.py
Line: 495-507

Comment:
**Server starts worker process instead of Prefect server**

`models/Dockerfile` sets the image-level `ENTRYPOINT` to the worker command:

```
ENTRYPOINT ["uv", "run", "--package", "tools", "prefect", "worker", "start", "--pool", "training-pool", "--type", "process"]
```

The server container definition here only overrides `command` (Docker CMD), not `entryPoint`. In Docker/ECS, `command` is **appended** to the existing `ENTRYPOINT` rather than replacing it. The server container would therefore execute:

```
uv run --package tools prefect worker start --pool training-pool --type process bash -c "..."
```

`prefect worker start` would receive `bash` and `-c "..."` as extra positional arguments and error out. The Prefect server would never start, causing the ECS service to crash-loop and the ALB health check on port 4200 to fail.

The fix is to add an `entryPoint` override that resets the Docker ENTRYPOINT so the `command` can run `bash -c "..."` directly:

```python
"entryPoint": ["bash", "-c"],
"command": [
    (
        "export PREFECT_API_DATABASE_CONNECTION_URL="
        '$(python3 -c "'
        "import os, urllib.parse;"
        "p=urllib.parse.quote(os.environ['PREFECT_DB_PASSWORD'],safe='');"
        f"print(f'postgresql+asyncpg://prefect:{{p}}@{args[1]}/prefect')"
        '")'
        " && prefect server start --host 0.0.0.0"
    )
],
```

Alternatively, change `models/Dockerfile` to use `CMD` instead of `ENTRYPOINT` for the default worker command so ECS `command` can fully replace it.

How can I resolve this? If you propose a fix, please make it concise.

_{Last reviewed commit: "Add fixes after manu..."}

Copilot

Pull request overview

This PR refactors model-training and infrastructure tooling to consolidate Docker/ECR handling, align ECS/Pulumi naming, and simplify Prefect-related workflows/resources.

Changes:

Moves TiDE train_model (+ configuration) into tide.workflow and relocates tests accordingly.
Renames/standardizes infrastructure resources (ECR repos, ECS services, service discovery names) around the new “model-trainer” and “*-server” naming.
Reworks maskfile.md and GitHub Actions to build/push/deploy images via unified commands and updated service naming.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
models/tide/tests/test_workflow.py	Adds `train_model` tests and updates patch targets after moving training logic.
models/tide/tests/test_trainer.py	Removes redundant trainer-module tests (now covered in workflow tests).
models/tide/tests/test_deploy.py	Updates expected Prefect cron schedule string.
models/tide/src/tide/workflow.py	Introduces `DEFAULT_CONFIGURATION` + `train_model` directly in workflow module; removes trainer import in flow.
models/tide/src/tide/trainer.py	Deletes legacy trainer module (functionality moved).
models/tide/src/tide/deploy.py	Updates Prefect deployment schedule and sets a timezone.
models/tide/Dockerfile	Renames CUDA stage from `trainer` to `runner`.
models/Dockerfile	Renames the final stage to `server-worker`.
maskfile.md	Consolidates image commands, adds ECS deploy command, updates model deploy/train/download commands.
libraries/python/tests/test_infrastructure_storage.py	Updates assertions for renamed ECR lifecycle resources.
infrastructure/storage.py	Renames ECR repositories and consolidates model-trainer to a single repo.
infrastructure/github_environment_runbook.md	Simplifies runbook instructions (bootstrap now sets secrets).
infrastructure/compute.py	Renames ECS/ALB/SG/RDS/Redis resources and switches server/worker to shared model-trainer image URI.
infrastructure/main.py	Updates Pulumi exports for renamed ECR resources and new Prefect API URL.
.github/workflows/teardown_infrastructure.yaml	Removes scheduled teardown trigger.
.github/workflows/sync_data.yaml	Removes repository-dispatch-based data sync workflow.
.github/workflows/launch_infrastructure.yaml	Updates matrix to build/push/deploy per app-stage and removes schedule + sync trigger.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

infrastructure/compute.py (1)
363-380: ⚠️ Potential issue | 🔴 Critical

This renames the live Prefect database instead of migrating it.

Changing both the Pulumi logical resource name (from "prefect_database" to "model_trainer_state") and the RDS identifier (from "fund-prefect" to "fund-model-trainer-state") creates a new instance, not an in-place rename. The old database fund-prefect will be orphaned because Pulumi will attempt to delete it, but Pulumi's own deletion_protection=True flag on the new resource does not affect the old resource. However, depending on how Pulumi handles this drift, you risk losing the existing Prefect state or creating deployment failures. Please migrate using a snapshot/restore process or revert the logical name and identifier back to their original values to perform an in-place update.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infrastructure/compute.py` around lines 363 - 380, You renamed the Pulumi
logical resource and the RDS identifier (new resource model_trainer_state with
identifier "fund-model-trainer-state"), which causes Pulumi to create a new RDS
instance and orphan the existing "fund-prefect" DB; to fix, either revert the
Pulumi logical name and identifier back to the original values (restore the
resource name prefect_database and identifier "fund-prefect") so Pulumi will
perform an in-place update, or implement an explicit snapshot/restore migration:
take a snapshot of the existing instance, restore that snapshot into the desired
new instance identifier, import/replace the Pulumi resource state as needed, and
ensure deletion_protection/backups are configured so the original instance isn’t
inadvertently destroyed; update the code around model_trainer_state, identifier,
and any references to the original prefect_database/fund-prefect to match the
chosen approach.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@infrastructure/compute.py`:
- Around line 476-480: The server and worker task definitions both use the
shared variable model_trainer_server_worker_image_uri, causing image race
conditions; change the deployment to use distinct image identifiers (e.g.,
model_trainer_server_image_uri and model_trainer_worker_image_uri) or pin
immutable digests for each stage and replace
model_trainer_server_worker_image_uri in container_definitions and the other
occurrence (the similar container_definitions block around the second mention)
with the appropriate per-role image variable so server and worker always get the
correct image.

In `@maskfile.md`:
- Around line 213-214: The ECS waiter timeout is too short (Delay: 10,
MaxAttempts: 6 → 60s); update the aws ecs wait services-stable command to allow
up to ~5 minutes by increasing the waiter-config (e.g., keep "Delay": 10 and set
"MaxAttempts": 30 or set "Delay": 20 and "MaxAttempts": 15) so the ecs wait
services-stable --cluster "$cluster" --services "$service" uses the longer
timeout for stabilization.

In `@models/tide/src/tide/workflow.py`:
- Around line 41-45: The train_model signature currently types configuration as
plain dict | None; tighten it by declaring and using a more specific
configuration type (e.g., a TypedDict named ModelConfiguration or at minimum
dict[str, Any]) and update the parameter type to ModelConfiguration | None (or
dict[str, Any] | None) to match the shape seen in DEFAULT_CONFIGURATION; update
any imports (typing.TypedDict/Any) and adjust usages in train_model to use the
new type alias so type checkers can validate keys referenced inside the
function.

In `@models/tide/tests/test_deploy.py`:
- Around line 25-27: The test is missing an assertion for the new timezone
parameter: update the assertions in the test (where call_kwargs,
call_kwargs["parameters"], cron and LOOKBACK_DAYS are checked) to include assert
call_kwargs["parameters"]["timezone"] == "America/New_York" so the deployed
job's timezone is validated alongside cron, base_url, and lookback days.

In `@models/tide/tests/test_workflow.py`:
- Around line 171-179: The two tests test_train_model_returns_model_and_data and
test_train_model_uses_default_configuration call train_model() without
overriding epoch_count and thus depend on production defaults; update these
tests to either inject a cheap test configuration (e.g., pass epoch_count=1 or a
TestConfig object) into train_model or stub/mock the model.fit path so training
is cheap and deterministic—look for train_model, epoch_count and any internal
fit/training call to patch or parameterize, and make the same change for the
other affected tests (the ones around the 201-207 range) so all unit tests pin a
minimal training budget.

---

Outside diff comments:
In `@infrastructure/compute.py`:
- Around line 363-380: You renamed the Pulumi logical resource and the RDS
identifier (new resource model_trainer_state with identifier
"fund-model-trainer-state"), which causes Pulumi to create a new RDS instance
and orphan the existing "fund-prefect" DB; to fix, either revert the Pulumi
logical name and identifier back to the original values (restore the resource
name prefect_database and identifier "fund-prefect") so Pulumi will perform an
in-place update, or implement an explicit snapshot/restore migration: take a
snapshot of the existing instance, restore that snapshot into the desired new
instance identifier, import/replace the Pulumi resource state as needed, and
ensure deletion_protection/backups are configured so the original instance isn’t
inadvertently destroyed; update the code around model_trainer_state, identifier,
and any references to the original prefect_database/fund-prefect to match the
chosen approach.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8d146e9a-23e4-490e-93b7-9bce689ed45b

📥 Commits

Reviewing files that changed from the base of the PR and between 9418e89 and eecab19.

📒 Files selected for processing (17)

.github/workflows/launch_infrastructure.yaml
.github/workflows/sync_data.yaml
.github/workflows/teardown_infrastructure.yaml
infrastructure/__main__.py
infrastructure/compute.py
infrastructure/github_environment_runbook.md
infrastructure/storage.py
libraries/python/tests/test_infrastructure_storage.py
maskfile.md
models/Dockerfile
models/tide/Dockerfile
models/tide/src/tide/deploy.py
models/tide/src/tide/trainer.py
models/tide/src/tide/workflow.py
models/tide/tests/test_deploy.py
models/tide/tests/test_trainer.py
models/tide/tests/test_workflow.py

💤 Files with no reviewable changes (4)

.github/workflows/teardown_infrastructure.yaml
.github/workflows/sync_data.yaml
models/tide/src/tide/trainer.py
models/tide/tests/test_trainer.py

Fix five issues identified in code review: - maskfile build+push: use ${AWS_REGION:-} instead of ${AWS_REGION} so set -euo pipefail does not abort before the friendly error check when the variable is unset - maskfile build+push: override repository_name to fund/model-trainer-server-worker when package_name is model-trainer, fixing mismatch between the single shared ECR repo in infrastructure and the fund/${package_name}-${stage_name} derivation that would produce non-existent repo names - maskfile build: translate hyphens to underscores via tr when resolving the Dockerfile path for application services, fixing broken path for data-manager, portfolio-manager, and ensemble-manager whose directories use underscores - CI workflow: change model-trainer matrix paths from the comma-separated string models/Dockerfile,models/__init__.py (treated as a single literal glob by dorny/paths-filter, never matching) to models/** - workflow.py: tighten configuration type hint from dict | None to dict[str, Any] | None per project type hint guidelines - test_deploy.py: add assertion for timezone parameter - test_workflow.py: pass epoch_count=1 to two tests that previously ran the full 20-epoch production schedule, keeping unit tests cheap Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

maskfile.md (1)

220-221: ⚠️ Potential issue | 🟡 Minor

Increase ECS waiter timeout to avoid flaky deploy failures.

Line 220 and Line 221 currently allow only ~60 seconds, which is commonly insufficient for ECS service stabilization during rolling updates.

⏱️ Suggested timeout adjustment

 aws ecs wait services-stable --cluster "$cluster" --services "$service" \
-    --waiter-config '{"Delay": 10, "MaxAttempts": 6}'
+    --waiter-config '{"Delay": 15, "MaxAttempts": 20}'

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@maskfile.md` around lines 220 - 221, The ECS waiter timeout is too short for
service stabilization; update the aws ecs wait services-stable invocation (the
command using "$cluster" and "$service") to increase the waiter-config
Delay/MaxAttempts values so the total wait is much longer (e.g., Delay 15 and
MaxAttempts 40 for ~10 minutes) to avoid flaky deploy failures during rolling
updates; modify the waiter-config JSON accordingly to reflect the new Delay and
MaxAttempts.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@models/tide/src/tide/workflow.py`:
- Around line 26-38: DEFAULT_CONFIGURATION is exported as a regular mutable dict
which can be accidentally mutated; replace it with an immutable mapping (e.g.,
wrap the literal with types.MappingProxyType or expose a
get_default_configuration() that returns a shallow copy) so callers cannot
change module-level defaults. Update the top-level symbol DEFAULT_CONFIGURATION
to be the immutable MappingProxyType of the existing dict (or replace usages to
call get_default_configuration()), import types.MappingProxyType if needed, and
ensure any tests or code that previously mutated DEFAULT_CONFIGURATION instead
obtain a mutable copy via dict(DEFAULT_CONFIGURATION) or the getter.

---

Duplicate comments:
In `@maskfile.md`:
- Around line 220-221: The ECS waiter timeout is too short for service
stabilization; update the aws ecs wait services-stable invocation (the command
using "$cluster" and "$service") to increase the waiter-config Delay/MaxAttempts
values so the total wait is much longer (e.g., Delay 15 and MaxAttempts 40 for
~10 minutes) to avoid flaky deploy failures during rolling updates; modify the
waiter-config JSON accordingly to reflect the new Delay and MaxAttempts.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: aab16c18-6084-48fe-bf77-573c8021d82a

📥 Commits

Reviewing files that changed from the base of the PR and between eecab19 and a570618.

📒 Files selected for processing (5)

.github/workflows/launch_infrastructure.yaml
maskfile.md
models/tide/src/tide/workflow.py
models/tide/tests/test_deploy.py
models/tide/tests/test_workflow.py

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

.claude/commands/update-pull-request.md (1)
32-74: ⚠️ Potential issue | 🟠 Major

Add strict error handling to all bash blocks to catch silent failures in jq transformations.

Blocks 1 and 3 execute external commands (gh api, jq) without error checking. Block 2 validates Block 1's output, but Block 3's jq transformations produce files with no downstream validation. If jq fails on malformed input or file I/O issues, the output files are empty or invalid, and downstream steps (lines 164+, 335–336) proceed silently with corrupted data.

Add set -euo pipefail at the start of each bash block (lines 32, 78, and 103) to exit immediately on command failure.
Suggested fix
   ```bash
+  set -euo pipefail
   # Set up scratchpad directory with session isolation
   : "${TMPDIR:=/tmp}"
   ```bash
+  set -euo pipefail
   SCRATCHPAD="<value printed above>"
   
   if ! command -v jq >/dev/null 2>&1; then
   ```bash
+  set -euo pipefail
   SCRATCHPAD="<value printed above>"
   
   echo "Extracting structured data from raw responses..."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/update-pull-request.md around lines 32 - 74, The script
currently runs gh and jq invocations (e.g., gh api calls that write
pr_meta_raw.json, pr_comments_raw.json, pr_reviews_raw.json,
pr_review_comments_raw.json, check_runs_raw.json and the INLINE_QUERY GraphQL
call that writes thread_ids_raw.json) without strict failure handling; add set
-euo pipefail at the top of each bash block that performs external commands (the
scratchpad creation block that sets SCRATCHPAD and the blocks that call gh/jq
and that perform jq transformations) so the script exits on errors/undefined
variables and failed pipes, and also ensure any jq transformations that write
derived files validate input (e.g., check files exist and non-empty after jq
runs) before proceeding; update blocks around SCRATCHPAD creation, the gh api
fetches, and the jq processing/INLINE_QUERY usage accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/commands/update-pull-request.md:
- Around line 307-330: The script currently uses hardcoded placeholder thread
IDs in the threads list and calls urllib.request.urlopen without a timeout or
error checking; update it to read real thread IDs from the scratchpad file
instead of the hardcoded "threads" list, add a timeout to the network call (use
the timeout parameter on urllib.request.urlopen), and validate the GraphQL
response by checking for an "errors" key and HTTP error conditions before using
data; also wrap the token retrieval, network call (urllib.request.urlopen) and
JSON parsing in try/except to log/print failures and continue processing other
thread IDs so one error doesn’t hang or abort the whole script.
- Around line 56-60: The REST gh api calls that save PR data (the commands
writing to "${SCRATCHPAD}/pr_meta_raw.json",
"${SCRATCHPAD}/pr_comments_raw.json", "${SCRATCHPAD}/pr_reviews_raw.json", and
"${SCRATCHPAD}/pr_review_comments_raw.json") only fetch the first page; add the
--paginate flag to each of those gh api invocations so all pages are retrieved.
For the GraphQL query that extracts review threads (the block capped at 100),
implement cursor-based pagination (use endCursor/hasNextPage and loop until
hasNextPage is false) so all threads are fetched, and when using jq to write
combined outputs, add --slurp to produce a single JSON array to avoid fragmented
output. Ensure the same --paginate and pagination approach is applied to the
other REST/GraphQL review/comments fetches referenced in the file (including the
review-threads section).

---

Outside diff comments:
In @.claude/commands/update-pull-request.md:
- Around line 32-74: The script currently runs gh and jq invocations (e.g., gh
api calls that write pr_meta_raw.json, pr_comments_raw.json,
pr_reviews_raw.json, pr_review_comments_raw.json, check_runs_raw.json and the
INLINE_QUERY GraphQL call that writes thread_ids_raw.json) without strict
failure handling; add set -euo pipefail at the top of each bash block that
performs external commands (the scratchpad creation block that sets SCRATCHPAD
and the blocks that call gh/jq and that perform jq transformations) so the
script exits on errors/undefined variables and failed pipes, and also ensure any
jq transformations that write derived files validate input (e.g., check files
exist and non-empty after jq runs) before proceeding; update blocks around
SCRATCHPAD creation, the gh api fetches, and the jq processing/INLINE_QUERY
usage accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 30d71002-9f0d-4c39-ac89-07d5835a35b5

📥 Commits

Reviewing files that changed from the base of the PR and between a570618 and b5a9f30.

📒 Files selected for processing (1)

.claude/commands/update-pull-request.md

Copilot

Pull request overview

This PR consolidates model-training code and modernizes infrastructure/image handling by standardizing service naming, ECR repositories, and ECS deployment workflows (aligned with the newer Prefect pattern).

Changes:

Moved TiDE train_model (and its default configuration) into tide.workflow, updating tests accordingly and removing tide.trainer.
Standardized ECS/ECR naming (notably “*-server” suffixes) and consolidated model-trainer images into a single ECR repository.
Updated automation: adjusted Prefect deployment schedule (weekdays + timezone), removed scheduled infra teardown/data sync workflows, and added a CLI deploy command for ECS services.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`models/tide/src/tide/workflow.py`	Adds `DEFAULT_CONFIGURATION` + `train_model` to workflow module; removes deferred import from old trainer.
`models/tide/src/tide/trainer.py`	Removes the old training module (now consolidated into workflow).
`models/tide/tests/test_workflow.py`	Updates patching target and adds unit tests for `train_model` now in `tide.workflow`.
`models/tide/tests/test_trainer.py`	Removes trainer-focused tests (moved into workflow tests).
`models/tide/src/tide/deploy.py`	Updates Prefect deployment schedule (weekday cron + timezone).
`models/tide/tests/test_deploy.py`	Aligns tests with updated cron + timezone.
`maskfile.md`	Reworks image build/push semantics around `package_name`, adds `image deploy`, and updates model commands.
`.github/workflows/launch_infrastructure.yaml`	Updates CI to build/push and now attempt ECS deploy per matrix entry.
`.github/workflows/teardown_infrastructure.yaml`	Removes scheduled teardown (manual only).
`.github/workflows/sync_data.yaml`	Removes scheduled data-sync workflow.
`infrastructure/storage.py`	Renames ECR repos/lifecycle policies; consolidates model-trainer repo naming.
`infrastructure/compute.py`	Renames ECS resources, service discovery names, log groups, and Prefect state/broker resources.
`infrastructure/__main__.py`	Updates Pulumi exports to match new repo/service names and URLs.
`libraries/python/tests/test_infrastructure_storage.py`	Updates assertions for renamed lifecycle policy resources.
`models/Dockerfile`	Renames build stage to `server-worker`.
`models/tide/Dockerfile`	Renames CUDA stage from `trainer` to `runner`.
`infrastructure/github_environment_runbook.md`	Documents automated population of env secrets via bootstrap.
`.claude/commands/update-pull-request.md`	Updates operational guidance to prefer REST + Python over MCP GitHub tools.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…resolve script, Pulumi stack selection - Add --paginate flag to REST array endpoint fetches in update-pull-request.md, piping through jq -s 'add' to flatten multi-page responses into a single array - Replace hardcoded placeholder thread IDs in Python resolve script with automatic loading from thread_ids.json in scratchpad; add timeout=30 and GraphQL error checking - Add pulumi org get-default + pulumi stack select pattern to mask infrastructure image deploy command, matching the existing convention used in other mask commands - Add PULUMI_ACCESS_TOKEN env var to the deploy step in launch_infrastructure.yaml so Pulumi is authenticated when the deploy command runs in CI Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/launch_infrastructure.yaml:
- Around line 74-83: The pre-stack ECS service deploy step (the step named
"Deploy ${{ matrix.application }}-${{ matrix.stage }}" that runs "mask
infrastructure image deploy") can fail if services or infra are renamed; reorder
the workflow so the Pulumi stack operation ("mask infrastructure stack up" run
in the launch_infrastructure job) happens before any per-service deploys, i.e.
make the "launch_infrastructure" job run immediately after build/push and have
the matrix/service deploy steps depend on its completion (or move "mask
infrastructure image deploy" into a job that lists launch_infrastructure as a
prerequisite), ensuring "mask infrastructure image deploy" only runs after "mask
infrastructure stack up".
- Around line 4-7: The workflow currently triggers on every push to master;
narrow the trigger by adding on.push.paths to only infra/service-relevant
patterns (e.g. infrastructure/**, pulumi/**, deploy/**, charts/**,
scripts/pulumi/**) and/or on.push.paths-ignore for docs/metadata (e.g. **/*.md)
so the Pulumi job only runs for relevant changes; update the YAML's on.push
block (keeping branches: [master]) to include these path patterns to reduce
unnecessary full infra runs.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 638dec68-1539-4040-8435-6934342165a8

📥 Commits

Reviewing files that changed from the base of the PR and between b5a9f30 and 74b9232.

📒 Files selected for processing (3)

.claude/commands/update-pull-request.md
.github/workflows/launch_infrastructure.yaml
maskfile.md

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/launch_infrastructure.yaml:
- Around line 98-154: The deploy_images job duplicates the matrix and path
filter definitions; consolidate by extracting the matrix and path-filter config
into YAML anchors (e.g., define a top-level anchor like &matrix-config for the
matrix entries and &service-paths for the paths list) and then reference them in
deploy_images (strategy.matrix: <<: *matrix-config) and in the Check for service
changes step (use the anchored list for the filters input), ensuring job name
deploy_images and the paths-filter step id changes still consume the anchored
values; alternatively create a reusable workflow that accepts the matrix as an
input and have both deploy_images and build_and_push_images call it.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ebca9af7-8cb3-46d3-bf49-c9db9995a547

📥 Commits

Reviewing files that changed from the base of the PR and between 74b9232 and edfb386.

📒 Files selected for processing (1)

.github/workflows/launch_infrastructure.yaml

Copilot

Pull request overview

This PR standardizes Docker/ECR/ECS naming and deployment mechanics across the repo, consolidates TiDE training code into the workflow module, and updates CI/runbooks to match the new infrastructure + Prefect patterns.

Changes:

Consolidates TiDE model training logic into tide.workflow and migrates unit tests accordingly.
Standardizes ECR repo names / ECS service names (notably adding consistent -server suffixes and consolidating model-trainer images).
Reworks mask infra commands and GitHub Actions workflow steps for build/push/deploy, while removing scheduled data-sync and infra scheduling.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
models/tide/tests/test_workflow.py	Adds coverage for `train_model` now located in `tide.workflow`; fixes patch target.
models/tide/tests/test_trainer.py	Removes trainer-focused tests after moving training logic.
models/tide/tests/test_deploy.py	Updates expected Prefect deployment schedule (weekdays + timezone).
models/tide/src/tide/workflow.py	Introduces `DEFAULT_CONFIGURATION` + `train_model` and removes `tide.trainer` dependency from the flow.
models/tide/src/tide/trainer.py	Removes the standalone trainer module (training now in workflow).
models/tide/src/tide/deploy.py	Deploys Prefect schedule with weekday cron and explicit timezone.
models/tide/Dockerfile	Renames build stage to `runner` to match “tide runner” semantics.
models/Dockerfile	Renames stage to `server-worker` for combined model-trainer image.
maskfile.md	Consolidates image build/push logic, adds ECS deploy command, updates model commands and service naming.
libraries/python/tests/test_infrastructure_storage.py	Updates assertions for new lifecycle policy resource names.
infrastructure/storage.py	Renames ECR repos/lifecycle policies; consolidates model-trainer to a single repo.
infrastructure/github_environment_runbook.md	Updates runbook to reflect bootstrap-driven secret population.
infrastructure/compute.py	Renames ECS services, SD names, log groups, and Prefect state/broker resources to new naming scheme.
infrastructure/main.py	Updates Pulumi exports for renamed repos/images and new service discovery hostname.
.github/workflows/teardown_infrastructure.yaml	Removes scheduled teardown (now manual only).
.github/workflows/sync_data.yaml	Removes the sync-data workflow.
.github/workflows/launch_infrastructure.yaml	Updates build/push/deploy flow and matrix naming; removes schedule/manual triggers.
.claude/commands/update-pull-request.md	Updates automation guidance to use REST + Python for thread resolution reliability.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

This PR cleans up build/deploy and infrastructure naming, consolidates TiDE model-training code into the workflow module, and updates CI/CD + tooling to match the new ECS/model-trainer layout and Prefect scheduling pattern.

Changes:

Consolidate TiDE training logic/tests by moving train_model + defaults into tide.workflow and updating tests accordingly.
Rename/standardize infrastructure resources (ECR repos, ECS services, service discovery names) and update workflows/tooling to build/push/deploy by package_name + stage_name.
Adjust operational automation: remove scheduled sync/teardown workflows and change training deployment to weekday-only with an explicit timezone.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`models/tide/tests/test_workflow.py`	Adds `train_model` unit tests and updates patching/imports after trainer consolidation.
`models/tide/tests/test_trainer.py`	Removes trainer-specific tests (moved into workflow tests).
`models/tide/tests/test_deploy.py`	Updates expectations for weekday cron + timezone in Prefect deployment.
`models/tide/src/tide/workflow.py`	Adds `DEFAULT_CONFIGURATION` + `train_model` implementation; removes import dependency on removed trainer module.
`models/tide/src/tide/trainer.py`	Removes standalone trainer module (logic moved into workflow).
`models/tide/src/tide/deploy.py`	Updates Prefect deployment schedule (weekday-only) and sets timezone.
`models/tide/Dockerfile`	Renames the CUDA stage target from `trainer` to `runner`.
`models/Dockerfile`	Renames stage target from `worker` to `server-worker` for the consolidated model-trainer image.
`maskfile.md`	Renames/expands image commands (build/push/deploy), standardizes naming, and updates Pulumi import hints.
`libraries/python/tests/test_infrastructure_storage.py`	Updates infra snapshot expectations for renamed ECR lifecycle resources.
`infrastructure/storage.py`	Changes S3 bucket resource types/names and standardizes ECR repository/lifecycle resources.
`infrastructure/github_environment_runbook.md`	Updates runbook to reflect automated secret population via `mask ... --bootstrap`.
`infrastructure/compute.py`	Renames ECS services/log groups/service discovery and refactors Prefect backing resources to “model-trainer-*”.
`infrastructure/__main__.py`	Updates exported outputs for renamed ECR resources and training API URL.
`.github/workflows/teardown_infrastructure.yaml`	Removes scheduled teardown; keeps manual-only runs.
`.github/workflows/sync_data.yaml`	Removes automated data sync workflow.
`.github/workflows/launch_infrastructure.yaml`	Reworks to matrix build/push + separate deploy step; removes scheduled launch.
`.claude/commands/update-pull-request.md`	Updates internal PR-update instructions to prefer REST + `gh api` and Python for GraphQL mutations.

Comments suppressed due to low confidence (2)

infrastructure/compute.py:383

Renaming the RDS instance resource and changing identifier will force a brand-new database to be created (the old one cannot be renamed in-place), which risks downtime/data loss for Prefect unless there’s an explicit migration plan. If the intent is a logical rename only, keep the existing identifier and add ResourceOptions(aliases=[...]) to adopt the prior Pulumi resource; if a new DB is intended, document the migration/backup steps and how the new secret ARN will be wired into consumers.

# RDS PostgreSQL for Prefect database
model_trainer_state = aws.rds.Instance(
    "model_trainer_state",
    identifier="fund-model-trainer-state",
    engine="postgres",
    engine_version="14",
    instance_class="db.t3.micro",
    allocated_storage=20,
    db_name="prefect",
    username="prefect",
    manage_master_user_password=True,
    db_subnet_group_name=prefect_rds_subnet_group.name,
    vpc_security_group_ids=[prefect_rds_security_group.id],
    skip_final_snapshot=False,
    final_snapshot_identifier=f"fund-model-trainer-state-final-{pulumi.get_stack()}",
    backup_retention_period=7,
    storage_encrypted=True,
    deletion_protection=True,
    tags=tags,
)

infrastructure/compute.py:426

Changing the ElastiCache cluster_id (and Pulumi logical name) will replace the Redis cluster. If existing environments depend on the current broker state/endpoint, add aliases and keep the existing cluster_id, or document the migration and ensure all clients (Prefect server/worker) are updated to the new endpoint as part of the same rollout.

# ElastiCache Redis for Prefect messaging
model_trainer_broker = aws.elasticache.Cluster(
    "model_trainer_broker",
    cluster_id="fund-model-trainer-broker",
    engine="redis",
    engine_version="7.0",
    node_type="cache.t3.micro",
    num_cache_nodes=1,
    subnet_group_name=prefect_elasticache_subnet_group.name,
    security_group_ids=[prefect_redis_security_group.id],
    tags=tags,
)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…nsitive dependency

Copilot

Pull request overview

This PR consolidates Docker image handling and deployment workflows, renames/standardizes ECS + ECR/Pulumi resources (notably the model-trainer services), and moves TiDE training logic into the workflow module while adjusting Prefect deployment scheduling.

Changes:

Reworked Mask + GitHub Actions workflows for build/push/deploy, removed scheduled sync/teardown automation, and added an explicit ECS deploy command.
Renamed and consolidated AWS/Pulumi resources (ECR repos, ECS services, service discovery, logs, Prefect DB/Redis) to consistent *-server / model-trainer-* naming.
Moved TiDE train_model + default config into tide.workflow, updated tests, and adjusted training deployment schedule to weekdays with an explicit timezone.

Reviewed changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
`uv.lock`	Adds `pytz` to the lockfile dependency set.
`models/tide/tests/test_workflow.py`	Updates workflow tests and adds coverage for the relocated `train_model`.
`models/tide/tests/test_trainer.py`	Removes trainer tests now that training logic moved out of `tide.trainer`.
`models/tide/tests/test_deploy.py`	Updates expected Prefect cron schedule + timezone.
`models/tide/src/tide/workflow.py`	Introduces `DEFAULT_CONFIGURATION` + `train_model` and removes import from the deleted trainer module.
`models/tide/src/tide/trainer.py`	Deletes the old trainer module (training logic relocated).
`models/tide/src/tide/deploy.py`	Deploys Prefect flow on weekdays and sets timezone.
`models/tide/Dockerfile`	Renames the CUDA stage to `runner` (aligns with “tide runner” usage).
`models/Dockerfile`	Renames the runtime stage to `server-worker` for the consolidated model-trainer image target.
`maskfile.md`	Consolidates image commands under `infrastructure image`, adds `deploy`, updates Pulumi imports, and adds model command routing.
`libraries/python/tests/test_infrastructure_storage.py`	Updates infra snapshot expectations for new ECR repo lifecycle resource names.
`infrastructure/storage.py`	Updates S3 bucket resources and standardizes ECR repository naming + lifecycle resources.
`infrastructure/notifications.py`	Documents Pulumi config variable controlling monthly budget.
`infrastructure/github_environment_runbook.md`	Updates runbook to reflect automated env-secret population via `mask ... --bootstrap`.
`infrastructure/compute.py`	Renames ECS services/SD/log groups and consolidates server+worker to a single model-trainer image URI; tweaks ALB config.
`infrastructure/__main__.py`	Updates Pulumi stack outputs to reflect renamed repos/images and training API hostname.
`infrastructure/Pulumi.production.yaml`	Raises `monthlyBudgetLimitUsd` config value.
`applications/portfolio_manager/pyproject.toml`	Adds `pytz` dependency.
`applications/ensemble_manager/Dockerfile`	Copies `tools/` into the image (supports shared tooling usage).
`applications/data_manager/Dockerfile`	Sets `CARGO_BUILD_JOBS=4` during the build stage.
`.github/workflows/teardown_infrastructure.yaml`	Removes scheduled teardown; keeps manual workflow dispatch.
`.github/workflows/sync_data.yaml`	Removes the sync-data workflow.
`.github/workflows/launch_infrastructure.yaml`	Switches to a matrix build/push + deploy workflow; removes schedule trigger.
`.claude/commands/update-pull-request.md`	Updates internal PR-update automation guidance to prefer REST + Python for thread resolution.

Comments suppressed due to low confidence (1)

models/Dockerfile:24

The server-worker stage is now the canonical build target for the model-trainer image, but it hardcodes an ENTRYPOINT that starts a Prefect worker. Since the ECS model-trainer server task definition uses this same image and tries to override behavior via command, the server container may not start correctly (ECS command won’t replace ENTRYPOINT). Consider using CMD (so ECS can override), or provide separate server and worker targets/images.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Cleanup Docker image handling, ECS service management, and model trai…

eecab19

…ning resources

Copilot AI review requested due to automatic review settings March 16, 2026 01:05

github-project-automation Bot added this to Overview Mar 16, 2026

github-project-automation Bot moved this to In Progress in Overview Mar 16, 2026

github-actions Bot requested a review from chrisaddy March 16, 2026 01:05

github-actions Bot added python Python code updates markdown Markdown code updates yaml YAML code updates labels Mar 16, 2026

github-actions Bot assigned forstmeier Mar 16, 2026

Copilot started reviewing on behalf of forstmeier March 16, 2026 01:06 View session

greptile-apps Bot reviewed Mar 16, 2026

View reviewed changes

Comment thread maskfile.md

Comment thread infrastructure/storage.py

Copilot AI reviewed Mar 16, 2026

View reviewed changes

Comment thread maskfile.md Outdated

Comment thread maskfile.md

Comment thread maskfile.md

Comment thread maskfile.md Outdated

Comment thread maskfile.md

Comment thread .github/workflows/launch_infrastructure.yaml Outdated

coderabbitai Bot reviewed Mar 16, 2026

View reviewed changes

Comment thread infrastructure/compute.py

Comment thread maskfile.md Outdated

Comment thread models/tide/src/tide/workflow.py

Comment thread models/tide/tests/test_deploy.py

Comment thread models/tide/tests/test_workflow.py

coderabbitai Bot reviewed Mar 16, 2026

View reviewed changes

Comment thread models/tide/src/tide/workflow.py

Refactor "update-pull-request" custom command

b5a9f30

Copilot AI review requested due to automatic review settings March 16, 2026 02:16

Copilot started reviewing on behalf of forstmeier March 16, 2026 02:16 View session

coderabbitai Bot reviewed Mar 16, 2026

View reviewed changes

Comment thread .claude/commands/update-pull-request.md Outdated

Comment thread .claude/commands/update-pull-request.md

Copilot AI reviewed Mar 16, 2026

View reviewed changes

Comment thread models/tide/src/tide/workflow.py

Comment thread .github/workflows/launch_infrastructure.yaml Outdated

Comment thread maskfile.md Outdated

Comment thread infrastructure/compute.py

coderabbitai Bot reviewed Mar 16, 2026

View reviewed changes

Comment thread .github/workflows/launch_infrastructure.yaml

Comment thread .github/workflows/launch_infrastructure.yaml Outdated

Add bot pull request feedback

edfb386

Copilot AI review requested due to automatic review settings March 16, 2026 02:57

Copilot started reviewing on behalf of forstmeier March 16, 2026 02:57 View session

coderabbitai Bot reviewed Mar 16, 2026

View reviewed changes

Comment thread .github/workflows/launch_infrastructure.yaml

Copilot AI reviewed Mar 16, 2026

View reviewed changes

Comment thread .github/workflows/launch_infrastructure.yaml Outdated

Comment thread .github/workflows/launch_infrastructure.yaml

Add bot pull request feedback

1736636

coderabbitai Bot previously approved these changes Mar 16, 2026

View reviewed changes

Add fixes after manually applying "up"

f6c3a08

Copilot AI review requested due to automatic review settings March 17, 2026 14:15

forstmeier dismissed coderabbitai[bot]’s stale review via f6c3a08 March 17, 2026 14:15

Copilot started reviewing on behalf of forstmeier March 17, 2026 14:15 View session

Copilot AI reviewed Mar 17, 2026

View reviewed changes

Comment thread infrastructure/storage.py

Comment thread infrastructure/storage.py

Comment thread infrastructure/storage.py

Add pytz dependency to portfolio_manager to fix alpaca-py missing tra…

15cadd9

…nsitive dependency

forstmeier added the rust Rust code updates label Mar 18, 2026

coderabbitai Bot previously approved these changes Mar 18, 2026

View reviewed changes

Add fixes after manually handling Docker images/ECS services

b99a09e

Copilot AI review requested due to automatic review settings March 18, 2026 01:08

forstmeier dismissed coderabbitai[bot]’s stale review via b99a09e March 18, 2026 01:08

Copilot started reviewing on behalf of forstmeier March 18, 2026 01:09 View session

Copilot AI reviewed Mar 18, 2026

View reviewed changes

coderabbitai Bot approved these changes Mar 18, 2026

View reviewed changes

chrisaddy approved these changes Mar 18, 2026

View reviewed changes

chrisaddy merged commit 1daa616 into master Mar 18, 2026
12 checks passed

chrisaddy deleted the docker-image-handling-updates branch March 18, 2026 18:59

github-project-automation Bot moved this from In Progress to Done in Overview Mar 18, 2026

This was referenced Mar 19, 2026

Migrate to Prefect Cloud with optimized TiDE model #796

Merged

Add internal scheduler to portfolio manager #803

Merged

coderabbitai Bot mentioned this pull request Apr 3, 2026

Cleanup ECR resources and Docker image handling #813

Merged

coderabbitai Bot mentioned this pull request Apr 15, 2026

Update ECR/Docker naming conventions and enable cost anomaly detection #825

Merged

Conversation

forstmeier commented Mar 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Changes

Context

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

greptile-apps Bot commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 1/5

Important Files Changed

Comments Outside Diff (1)

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

forstmeier commented Mar 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 16, 2026 •

edited

Loading

greptile-apps Bot commented Mar 16, 2026 •

edited

Loading